Enriching SCFG rules directly from efficient bilingual chart parsing
Abstract
In this paper, we propose a new method for training translation rules for a synchronous context-free grammar (SCFG). A bilingual chart parser generates the parse forest, and the EM algorithm estimates expected counts for each rule in the rule set. Additional rules are constructed as combinations of reliable rules occurring in the parse forest. This way of proposing additional translation rules is independent of word alignments. We present the theoretical background for the method and initial experimental results on German-English translation of Europarl data.
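The EM step mentioned in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes each parse forest is small enough to enumerate its derivations explicitly and that all rules share a single left-hand side, so probabilities are normalised over the whole rule set. A chart parser would typically compute the same expectations over the packed forest with inside-outside rather than by enumeration. The rule names and the toy data are hypothetical.

```python
from collections import defaultdict
from math import prod


def em_rule_probs(forests, iterations=10):
    """Estimate rule probabilities from enumerated derivations.

    forests: one entry per sentence pair; each entry is a list of candidate
    derivations, and each derivation is a list of rule names.
    Assumption: all rules share one left-hand side (single normalisation).
    """
    rules = {r for forest in forests for deriv in forest for r in deriv}
    prob = {r: 1.0 / len(rules) for r in rules}  # uniform initialisation
    for _ in range(iterations):
        counts = defaultdict(float)
        for forest in forests:
            # E-step: posterior of each derivation under the current model.
            scores = [prod(prob[r] for r in deriv) for deriv in forest]
            total = sum(scores)
            for deriv, score in zip(forest, scores):
                for r in deriv:
                    counts[r] += score / total  # expected count of rule r
        # M-step: renormalise the expected counts into probabilities.
        norm = sum(counts.values())
        prob = {r: counts[r] / norm for r in rules}
    return prob


# Hypothetical toy data: the first sentence pair has two competing derivations.
forests = [
    [["r_das", "r_haus"], ["r_das_haus"]],
    [["r_das", "r_buch"]],
    [["r_ein", "r_haus"]],
]
probs = em_rule_probs(forests)
```

In this toy data the small rules `r_das` and `r_haus` are also needed by the other sentence pairs, so their expected counts grow across iterations at the expense of the competing one-rule derivation.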
Related resources
Two Methods for Extending Hierarchical Rules from the Bilingual Chart Parsing
This paper studies two methods for training hierarchical MT rules independently of word alignments. Bilingual chart parsing and the EM algorithm are used to train bitext correspondences. The first method, rule arithmetic, constructs new rules as combinations of existing and reliable rules used in the bilingual chart, significantly improving the translation accuracy on the German-English and Farsi-E...
Chart Parsing and Constraint Programming
In this paper, parsing-as-deduction and constraint programming are brought together to outline a procedure for the specification of constraint-based chart parsers. Following the proposal in Shieber et al. (1995), we show how to directly realize the inference rules for deductive parsers as Constraint Handling Rules (Frühwirth, 1998) by viewing the items of a chart parser as constraints and the c...
Acquiring a Stochastic Context-Free Grammar from the Penn Treebank
In this paper we present preliminary results of investigating the structure of the Penn Treebank and how these results can be used in probabilistic parsing of English. Penn Treebank is a corpus of 4.9 million part-of-speech (POS) tagged words and 2.9 million words of skeletally parsed data developed by the University of Pennsylvania (see [8]). By matching skeletal parse files with POS-tagged files w...
An Extended GHKM Algorithm for Inducing λ-SCFG
Semantic parsing, which aims at mapping a natural language (NL) sentence into its formal meaning representation (e.g., logical form), has received increasing attention in recent years. While synchronous context-free grammar (SCFG) augmented with lambda calculus (λSCFG) provides an effective mechanism for semantic parsing, how to learn such λ-SCFG rules still remains a challenge because of the d...
Bilingual Markov Reordering Labels for Hierarchical SMT
Earlier work on labeling Hiero grammars with monolingual syntax reports improved performance, suggesting that such labeling may impact phrase reordering as well as lexical selection. In this paper we explore the idea of inducing bilingual labels for Hiero grammars without using any additional resources other than original Hiero itself does. Our bilingual labels aim at capturing salient patterns...